Reference clustering
Abstract
This document describes the process we used to obtain a reference clustering from anti-virus labels in order to evaluate the malware clustering techniques presented in [3]. This text was included in an early version of [3] but had to be cut because of space limitations. Since the reference dataset has been made available and is being used by several researchers, it seems useful to provide detailed information on how it was obtained.

1 Reference Clustering

When an anti-virus program recognizes a malicious program, it assigns a name to the binary. Typically, virus labels are hierarchically structured: each label has at least one part denoting the malware family and one part describing the particular variant. However, there is no standard naming convention. Each anti-virus vendor creates and assigns its own virus labels, which is why the same file is typically labeled differently by different vendors. Moreover, the granularity of virus labels varies between virus scanners. McAfee and Grisoft, for example, have very general labels, where a single name covers a broad range of malware instances. McAfee also assigns labels such as 'Generic BackDoor' that have neither a variant nor a family part. Kaspersky, on the other hand, consistently uses labels such as 'EmailWorm.Win32.Zhelatin.jz', which allow for the straightforward extraction of a family and variant name.

It is possible to cluster a given set of malware samples based on the labels produced by an anti-virus program. However, as pointed out in [2], virus scanners are not particularly well-suited for clustering a given set. First, they
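As a rough illustration of label-based clustering, the following Python sketch splits a Kaspersky-style label into a family and a variant and groups samples by family. It is not the procedure used to build the reference clustering. The four-part, dot-separated layout (type, platform, family, variant) is an assumption inferred from the single example label in the text, and the names `parse_label`, `cluster_by_family`, and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class ParsedLabel(NamedTuple):
    family: str
    variant: str


def parse_label(label: str) -> Optional[ParsedLabel]:
    """Split a label of the assumed form Type.Platform.Family.Variant.

    Returns None for labels that do not match this layout, such as
    'Generic BackDoor', which has neither a family nor a variant part.
    """
    parts = label.strip().split(".")
    if len(parts) != 4 or not all(parts):
        return None
    _type, _platform, family, variant = parts
    return ParsedLabel(family=family, variant=variant)


def cluster_by_family(labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sample identifiers by the family part of their label.

    Samples whose label cannot be parsed are left out of the result.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for sample_id, label in labels.items():
        parsed = parse_label(label)
        if parsed is not None:
            clusters[parsed.family].append(sample_id)
    return dict(clusters)


# Hypothetical sample identifiers mapped to scanner labels.
example_labels = {
    "sample_a": "EmailWorm.Win32.Zhelatin.jz",
    "sample_b": "EmailWorm.Win32.Zhelatin.ab",
    "sample_c": "Generic BackDoor",
}
example_clusters = cluster_by_family(example_labels)
```

Here the two Zhelatin samples end up in one cluster, while the sample labeled 'Generic BackDoor' is dropped because its label carries no family part. A real label set would likely need more tolerant parsing than this fixed four-part split.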
Publication date: 2010